fix(logs): add durable execution diagnostics foundation by PlaneInABottle · Pull Request #3564 · simstudioai/sim

PlaneInABottle · 2026-03-13T10:57:07Z

Summary

persist durable execution diagnostics in workflow_execution_logs, including lastStartedBlock, lastCompletedBlock, trace metadata, and finalizationPath
centralize terminal execution finalization so completed, failed, cancelled, and paused runs keep consistent diagnostics without letting callback failures change execution outcomes
add focused regression coverage for diagnostics derivation, logging-session durability, and executor finalization ordering

Why

Later jobs and log read-surface fixes depend on a trustworthy execution diagnostics foundation. This PR stores the minimum durable data needed to explain where a run got to and how it ended without pulling in broader API or jobs-surface changes.

Scope

included: logging-session persistence, execution log shape updates, executor finalization ordering, and focused tests
excluded: jobs route changes, logs API read-surface changes, paused status normalization across APIs, cleanup routes, and webhook/async handoff UI work

Validation

bun --cwd apps/sim vitest run lib/logs/execution/diagnostics.test.ts lib/logs/execution/logger.test.ts lib/logs/execution/logging-session.test.ts lib/workflows/executor/execution-core.test.ts
bunx @biomejs/biome check apps/sim/lib/workflows/executor/execution-core.ts apps/sim/lib/workflows/executor/execution-core.test.ts apps/sim/lib/logs/execution/logging-session.ts apps/sim/lib/logs/execution/logging-session.test.ts apps/sim/lib/logs/execution/logger.ts apps/sim/executor/orchestrators/loop.ts apps/sim/executor/orchestrators/parallel.ts apps/sim/executor/orchestrators/node.ts apps/sim/executor/utils/subflow-utils.ts apps/sim/executor/execution/block-executor.ts
verified latest local execution rows directly in Postgres include finalizationPath, lastStartedBlock, and lastCompletedBlock

Follow-ups

centralize execution status contract for read surfaces
normalize paused execution status across read APIs
reconcile jobs async status with execution truth and expose handoff state

cursor · 2026-03-13T10:57:13Z

PR Summary

Medium Risk
Touches executor orchestration and execution-finalization/log persistence paths, which can affect run outcomes and logging correctness if sequencing or async behavior is wrong. Changes are guarded with try/catch and added tests, but include new DB writes and async callback awaiting that may impact timing.

Overview
Adds durable execution diagnostics to workflow_execution_logs, including lastStartedBlock, lastCompletedBlock, trace span presence/count, and a finalizationPath/completionFailure to explain how a run ended.

Refactors lifecycle signaling across BlockExecutor, loop/parallel orchestrators, and subflow-utils to await onBlockStart/onBlockComplete (including empty-subflow events) while swallowing callback failures so they don’t change execution outcomes.
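As a rough illustration of the "await, but swallow failures" behavior described above, the sketch below wraps a lifecycle callback so that a rejected callback is reported and never rethrown. The names `BlockCallback`, `safelyAwait`, `runBlock`, and the `warn` sink are hypothetical and are not taken from the sim codebase.

```typescript
// Hypothetical names throughout; this is a sketch, not the PR's code.
type BlockCallback = (blockId: string) => void | Promise<void>

async function safelyAwait(
  callback: BlockCallback | undefined,
  blockId: string,
  warn: (message: string) => void
): Promise<void> {
  if (!callback) return
  try {
    // Awaiting lets any persistence done inside the callback settle first.
    await callback(blockId)
  } catch (error) {
    // The failure is reported but never rethrown, so it cannot turn a
    // successful block into a failed one.
    const reason = error instanceof Error ? error.message : String(error)
    warn(`callback failed for block ${blockId}: ${reason}`)
  }
}

// Stand-in for a block run: the returned output does not depend on
// whether the callback succeeded.
async function runBlock(
  blockId: string,
  onBlockStart: BlockCallback | undefined,
  warn: (message: string) => void
): Promise<string> {
  await safelyAwait(onBlockStart, blockId, warn)
  return `output:${blockId}`
}
```

With a callback that always throws, `runBlock` still resolves with its output and the error only reaches the `warn` sink.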

Updates LoggingSession to persist last-started/last-completed markers via monotonic JSONB updates, track/drain pending “progress” writes before terminal completion, and ensure fallback completions preserve successful outputs and accumulated cost; ExecutionLogger now carries through existing diagnostics and computes trace span counts. Adds focused regression tests for ordering, fallback semantics, and empty-parallel lifecycle behavior.
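The monotonic marker rule can be sketched in memory as follows. This is only an illustration of the ordering idea, assuming markers carry an ISO-8601 timestamp; the PR applies it in the database through JSONB updates, and the `BlockMarker` shape and `mergeMarker` helper here are hypothetical. Ties go to the incoming write, in line with the later commit that lets same-millisecond writes replace prior markers.

```typescript
// Hypothetical marker shape; the real stored fields may differ.
interface BlockMarker {
  blockId: string
  // ISO-8601 timestamp of when the block started or completed.
  at: string
}

// Returns the marker that should be stored after `incoming` arrives.
function mergeMarker(existing: BlockMarker | undefined, incoming: BlockMarker): BlockMarker {
  if (!existing) return incoming
  // Keep the stored marker only when it is strictly newer. Equal timestamps
  // let the incoming write win, so same-millisecond updates still apply.
  return Date.parse(existing.at) > Date.parse(incoming.at) ? existing : incoming
}
```

Under this rule a stale write that arrives late cannot move the marker backwards, which is what makes out-of-order progress writes safe.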

Written by Cursor Bugbot for commit 767775c.

vercel · 2026-03-13T10:57:15Z

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment

Project	Deployment	Actions	Updated (UTC)
docs	Skipped		Mar 18, 2026 0:00am

greptile-apps · 2026-03-13T11:05:14Z

Greptile Summary

This PR establishes a durable execution diagnostics foundation by persisting lastStartedBlock, lastCompletedBlock, finalizationPath, and completionFailure into workflow_execution_logs, and by converting the previously fire-and-forget post-execution logging in execution-core.ts into an awaited, centralized finalization step.

Key changes:

logging-session.ts: Adds onBlockStart / onBlockComplete lifecycle hooks that issue monotonic JSONB jsonb_set writes per block, tracked in a pendingProgressWrites set that is drained before any terminal completion call.
execution-core.ts: Replaces fire-and-forget void (async () => {})() finalization with await finalizeExecutionOutcome(...) / await finalizeExecutionError(...), ensuring DB writes complete before the function returns; introduces wasExecutionFinalizedByCore to prevent double-logging by outer callers.
All orchestrators (block-executor, loop, parallel, node) and subflow-utils are updated to make their onBlockStart / onBlockComplete callbacks async and wrapped in try/catch so callback failures never break execution.
diagnostics.ts (new): Provides buildExecutionDiagnostics for the upcoming read-surface — currently covered by tests but not yet wired to any route.
types.ts: Adds ExecutionFinalizationPath, ExecutionLastStartedBlock, and ExecutionLastCompletedBlock types with a runtime type guard.
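The tracked-write idea from the logging-session.ts item can be sketched as a small class. This is an assumption-laden illustration, not the PR's implementation: `ProgressWriteTracker` is a hypothetical name, and the looping drain is one possible way to also cover writes registered while an earlier batch is still being awaited (the "snapshot" concern raised in the review).

```typescript
// Hypothetical helper; names and structure are illustrative only.
class ProgressWriteTracker {
  private readonly pending = new Set<Promise<void>>()

  // Registers a write synchronously. Rejections are swallowed so one bad
  // write cannot make the drain reject.
  track(write: Promise<void>): void {
    const settled: Promise<void> = write
      .catch(() => undefined)
      .then(() => {
        this.pending.delete(settled)
      })
    this.pending.add(settled)
  }

  // Loops until the set is empty, so writes registered during an earlier
  // await are waited for as well.
  async drain(): Promise<void> {
    while (this.pending.size > 0) {
      await Promise.all(Array.from(this.pending))
    }
  }

  get size(): number {
    return this.pending.size
  }
}
```

A terminal completion would call `drain()` first and only then issue its final write, so progress markers cannot land after the terminal row.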

Notable behavior change: The response from executeWorkflowCore is now delayed until DB finalization writes complete, trading a small latency increase for guaranteed diagnostic durability.
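A minimal sketch of that tradeoff, under stated assumptions: the terminal log write is awaited before the result is returned, logging failures are swallowed, and errors that were already finalized are marked so an outer caller can skip logging them again. `executeCore`, `persistTerminalLog`, and `wasFinalizedByCore` are hypothetical stand-ins; the real signatures of `finalizeExecutionOutcome`, `finalizeExecutionError`, and `wasExecutionFinalizedByCore` are not shown in this thread.

```typescript
// Hypothetical stand-ins for the helpers named in the summary.
type TerminalStatus = 'completed' | 'failed'

const finalizedErrors = new WeakSet<object>()

// Lets an outer caller check whether the core already logged this error.
function wasFinalizedByCore(error: unknown): boolean {
  return typeof error === 'object' && error !== null && finalizedErrors.has(error)
}

async function executeCore(
  run: () => Promise<string>,
  persistTerminalLog: (status: TerminalStatus) => Promise<void>
): Promise<string> {
  let output: string
  try {
    output = await run()
  } catch (error) {
    // Awaited so the failure row is durable before the error propagates;
    // a logging failure is swallowed and the original error is rethrown.
    await persistTerminalLog('failed').catch(() => undefined)
    if (typeof error === 'object' && error !== null) finalizedErrors.add(error)
    throw error
  }
  // Awaited rather than fire-and-forget: the caller sees the result only
  // after the terminal write has settled.
  await persistTerminalLog('completed').catch(() => undefined)
  return output
}
```

An outer handler would then log an error itself only when `wasFinalizedByCore(error)` is false.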

Confidence Score: 4/5

Safe to merge — no critical logic bugs found; changes are well-tested and isolated to the logging/diagnostics layer.
The refactoring is thorough, backed by focused regression tests covering ordering invariants, retry semantics, and callback isolation. The main behavioral change (awaiting DB writes before returning) is intentional and correct. The noted issues are minor: countTraceSpans duplication, a snapshot-based drain that could theoretically miss late writes (practically safe given the execution model), a changed import style in logger.test.ts, and fire-and-forget cost flushes that are not tracked in pendingProgressWrites (safe because the terminal write uses in-memory cost). No runtime errors or data-loss scenarios were identified.
Pay close attention to apps/sim/lib/logs/execution/logging-session.ts (drain semantics and cost flush fire-and-forget) and apps/sim/lib/workflows/executor/execution-core.ts (new awaited finalization path).

Important Files Changed

Filename	Overview
apps/sim/lib/workflows/executor/execution-core.ts	Major refactor: replaces fire-and-forget post-execution logging with awaited `finalizeExecutionOutcome`/`finalizeExecutionError` helpers; adds `wrappedOnBlockStart` that awaits persistence before firing user callbacks as void; exports `wasExecutionFinalizedByCore` for double-finalization prevention. Behavior change: response latency increases slightly as DB writes are now awaited before returning, which is the correct tradeoff for reliability.
apps/sim/lib/logs/execution/logging-session.ts	Substantial additions: `onBlockStart`/`pendingProgressWrites` tracking, monotonic JSONB update queries for `lastStartedBlock`/`lastCompletedBlock`, `drainPendingProgressWrites` before terminal finalization, centralized `completeExecutionWithFinalization`, and `completionPromise` clearing on failure to allow error-path retry. Minor concern: snapshot-based drain could theoretically miss late-registered writes; `flushAccumulatedCost` is now fire-and-forget without being tracked in `pendingProgressWrites`.
apps/sim/lib/logs/execution/logger.ts	Added `buildCompletedExecutionData` helper that merges `lastStartedBlock`, `lastCompletedBlock`, `finalizationPath`, `completionFailure`, and trace metadata from existing DB data and new params. Introduces a duplicate `countTraceSpans` (also in `diagnostics.ts`). The `completeWorkflowExecution` signature is expanded with `finalizationPath` and `completionFailure` params.
apps/sim/lib/logs/execution/diagnostics.ts	New utility for deriving execution diagnostics from existing `executionData` (DB read path). Handles untyped data safely with runtime checks; validates `finalizationPath` using the new type guard. Currently unused in the main codebase (only in tests) — serves as foundation for upcoming read-surface changes.
apps/sim/lib/logs/types.ts	Adds `ExecutionFinalizationPath` const enum with type guard, `ExecutionLastStartedBlock`/`ExecutionLastCompletedBlock` interfaces, and extends `WorkflowExecutionLog['executionData']` with the new diagnostic fields. Clean additions with no breaking changes to existing callers.
apps/sim/executor/execution/block-executor.ts	Made `callOnBlockStart` and `callOnBlockComplete` async with try/catch wrappers so callback failures are logged but never bubble up to break block execution. Straightforward and safe change.
apps/sim/executor/utils/subflow-utils.ts	Updated `addSubflowErrorLog` and `emitEmptySubflowEvents` to use `void promise.catch()` pattern for `onBlockStart`/`onBlockComplete` since these are synchronous utility functions. Writes are still registered in `pendingProgressWrites` synchronously before first suspension, so drain semantics are preserved.

Sequence Diagram

sequenceDiagram
    participant EC as executeWorkflowCore
    participant LS as LoggingSession
    participant DB as Database
    participant EX as Executor

    EC->>LS: safeStart()
    EC->>EX: execute() with wrappedOnBlockStart/Complete

    loop For each block
        EX->>EC: wrappedOnBlockStart(blockId, ...)
        EC->>LS: onBlockStart(blockId, startedAt)
        LS->>DB: jsonb_set lastStartedBlock (monotonic)
        DB-->>LS: ack (tracked in pendingProgressWrites)
        LS-->>EC: resolved
        EC-->>EX: void userCallback fired separately

        EX->>EC: wrappedOnBlockComplete(blockId, output)
        EC->>LS: onBlockComplete(blockId, output)
        LS->>DB: jsonb_set lastCompletedBlock (monotonic)
        LS->>DB: void flushAccumulatedCost (fire-and-forget)
        LS-->>EC: resolved
        EC-->>EX: void userCallback fired separately
    end

    EX-->>EC: ExecutionResult

    EC->>EC: finalizeExecutionOutcome()
    EC->>LS: safeComplete / safeCompleteWithCancellation / safeCompleteWithPause
    LS->>LS: drainPendingProgressWrites()
    LS->>DB: completeWorkflowExecution (finalizationPath, lastStarted/CompletedBlock, traceSpans)
    DB-->>LS: ack
    LS-->>EC: resolved

    EC->>DB: clearExecutionCancellation
    EC->>DB: updateWorkflowRunCounts
    EC-->>Caller: ExecutionResult

Last reviewed commit: 9db5e87

PlaneInABottle · 2026-03-13T11:12:40Z

@icecrasher321 I think this is an important one. One scenario I have run into is a stuck workflow: I couldn't find any logs, and the run just stayed in the running state. I'm planning to open a few more PRs after this one too.

icecrasher321 · 2026-03-14T19:43:48Z

> @icecrasher321 I think this is an important one. One scenario I have run into is a stuck workflow: I couldn't find any logs, and the run just stayed in the running state. I'm planning to open a few more PRs after this one too.

Cool yeah, I'll review these. Thanks.

icecrasher321 · 2026-03-14T19:45:11Z

bugbot run

cursor

✅ Bugbot reviewed your changes and found no new issues!

icecrasher321

Tested various cases: API execs, HITL, Manual. No ordering issues and the pattern makes sense.

* fix(mothership): fix mothership file uploads (#3640)
* Fix files
* Fix
* Fix
* fix(workspace): prevent stale placeholder data from corrupting workflow registry on switch
* feat(csp): allow chat UI to be embedded in iframes (#3643)
* feat(csp): allow chat UI to be embedded in iframes
  Mirror the existing form embed CSP pattern for chat pages: add getChatEmbedCSPPolicy() with frame-ancestors *, configure /chat/:path* headers in next.config.ts without X-Frame-Options, and early-return in proxy.ts so chat routes skip the strict runtime CSP.
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor(csp): extract shared getEmbedCSPPolicy helper
  Deduplicate getChatEmbedCSPPolicy and getFormEmbedCSPPolicy into a shared private helper to prevent future divergence.
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* fix(logs): add durable execution diagnostics foundation (#3564)
* fix(logs): persist execution diagnostics markers
  Store last-started and last-completed block markers with finalization metadata so later read surfaces can explain how a run ended without reconstructing executor state.
* fix(executor): preserve durable diagnostics ordering
  Await only the persistence needed to keep diagnostics durable before terminal completion while keeping callback failures from changing execution behavior.
* fix(logs): preserve fallback diagnostics semantics
  Keep successful fallback output and accumulated cost intact while tightening progress-write draining and deduplicating trace span counting for diagnostics helpers.
* fix(api): restore async execute route test mock
  Add the missing AuthType export to the hybrid auth mock so the async execution route test exercises the 202 queueing path instead of crashing with a 500 in CI.
* fix(executor): align async block error handling
* fix(logs): tighten marker ordering scope
  Allow same-millisecond marker writes to replace prior markers and drop the unused diagnostics read helper so this PR stays focused on persistence rather than unread foundation code.
* fix(logs): remove unused finalization type guard
  Drop the unused helper so this PR only ships the persistence-side status types it actually uses.
* fix(executor): await subflow diagnostics callbacks
  Ensure empty-subflow and subflow-error lifecycle callbacks participate in progress-write draining before terminal finalization while still swallowing callback failures.
---------
Co-authored-by: test <test@example.com>
Co-authored-by: Vikhyath Mondreti <vikhyath@simstudio.ai>
* feat(admin): add user search by email and ID, remove table border
  - Replace Load Users button with a live search input; query fires on any input
  - Email search uses listUsers with contains operator
  - User ID search (UUID format) uses admin.getUser directly for exact lookup
  - Remove outer border on user table that rendered white in dark mode
  - Reset pagination to page 0 on new search
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(admin): replace live search with explicit search button
  - Split searchInput (controlled input) from searchQuery (committed value) so the hook only fires on Search click or Enter, not every keystroke
  - Gate table render on searchQuery.length > 0 to prevent stale results showing after input is cleared
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Siddharth Ganesan <33737564+Sg312@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: PlaneInABottle <y.mirza.altay@gmail.com>
Co-authored-by: test <test@example.com>
Co-authored-by: Vikhyath Mondreti <vikhyath@simstudio.ai>

vercel bot temporarily deployed to Preview March 13, 2026 10:57 Inactive

cursor bot reviewed Mar 13, 2026

apps/sim/lib/logs/execution/logger.ts

apps/sim/lib/logs/execution/logging-session.ts (outdated)

greptile-apps bot reviewed Mar 13, 2026

apps/sim/lib/logs/execution/logger.ts

apps/sim/lib/logs/execution/logging-session.ts

apps/sim/lib/logs/execution/logger.test.ts

apps/sim/lib/logs/execution/logging-session.ts (outdated)

test added 3 commits March 13, 2026 14:17

fix(logs): persist execution diagnostics markers

1e2fbab

Store last-started and last-completed block markers with finalization metadata so later read surfaces can explain how a run ended without reconstructing executor state.

fix(executor): preserve durable diagnostics ordering

2901db4

Await only the persistence needed to keep diagnostics durable before terminal completion while keeping callback failures from changing execution behavior.

fix(logs): preserve fallback diagnostics semantics

c6d9195

Keep successful fallback output and accumulated cost intact while tightening progress-write draining and deduplicating trace span counting for diagnostics helpers.

PlaneInABottle force-pushed the upstream/execution-diagnostics-foundation branch from 9db5e87 to c6d9195 Compare March 13, 2026 11:44

vercel bot temporarily deployed to Preview March 13, 2026 11:44 Inactive

fix(api): restore async execute route test mock

3083a88

Add the missing AuthType export to the hybrid auth mock so the async execution route test exercises the 202 queueing path instead of crashing with a 500 in CI.

vercel bot temporarily deployed to Preview March 13, 2026 11:51 Inactive

cursor bot reviewed Mar 13, 2026

apps/sim/executor/execution/block-executor.ts

fix(executor): align async block error handling

ab6aa29

vercel bot temporarily deployed to Preview March 13, 2026 12:13 Inactive

cursor bot reviewed Mar 13, 2026

apps/sim/lib/logs/execution/logging-session.ts (outdated)

apps/sim/lib/logs/execution/diagnostics.ts (outdated)

fix(logs): tighten marker ordering scope

44f4ad9

Allow same-millisecond marker writes to replace prior markers and drop the unused diagnostics read helper so this PR stays focused on persistence rather than unread foundation code.

vercel bot temporarily deployed to Preview March 13, 2026 12:42 Inactive

cursor bot reviewed Mar 13, 2026

apps/sim/lib/logs/types.ts (outdated)

fix(logs): remove unused finalization type guard

b4f0ced

Drop the unused helper so this PR only ships the persistence-side status types it actually uses.

vercel bot temporarily deployed to Preview March 13, 2026 12:56 Inactive

cursor bot reviewed Mar 13, 2026

apps/sim/executor/utils/subflow-utils.ts

fix(executor): await subflow diagnostics callbacks

c3f5d77

Ensure empty-subflow and subflow-error lifecycle callbacks participate in progress-write draining before terminal finalization while still swallowing callback failures.

vercel bot temporarily deployed to Preview March 13, 2026 13:15 Inactive

Merge upstream/staging into upstream/execution-diagnostics-foundation

767775c

vercel bot temporarily deployed to Preview March 14, 2026 18:02 Inactive

icecrasher321 self-assigned this Mar 14, 2026

cursor bot reviewed Mar 14, 2026

PlaneInABottle mentioned this pull request Mar 15, 2026

Async executions can stay stuck in running with missing final logs #3518

Open

Merge branch 'staging' into upstream/execution-diagnostics-foundation

d77bc98

vercel bot temporarily deployed to Preview March 18, 2026 00:00 Inactive

icecrasher321 approved these changes Mar 18, 2026

icecrasher321 merged commit 67478bb into simstudioai:staging Mar 18, 2026
11 checks passed

waleedlatif1 mentioned this pull request Mar 18, 2026

v0.6.2: mothership stability, chat iframe embedding, KB upserts, new blog post #3650

Merged
